ABSTRACT



A natural language (NL) analyzing system is provided 
with the capability to analyze NL expressions and to 
resolve ambiguities and present them to the user for 
verification of correct interpretation. A conceptual 
model of the system, relevant to the application in 
which the invention is implemented, is created (custom- 
izing the system) by the user, and is stored as a concep- 
tual schema. The schema is built of logical facts repre- 
senting entities (concepts) and relationships between 
entities, forming a description of the universe of dis- 
course or object system in question. The entities of the 
schema have at least one external connection, namely to 
natural language terms in a vocabulary. The schema 
itself is completely language independent, though the 
components of it may have "names" expressed in a 
natural language such as English. There may be a sec- 
ond connection to the entities, namely where the system 
is used in a query system for relational data bases. In this 
case the entities of the schema represent objects in the 
data base, and thus there is a connection between the 
entities and those objects of the data base. The actual 
analysis of NL expressions is performed by a natural 
language engine (NLE) in cooperation with an analysis 
grammar and the schema. The analysis results in an 
intermediate, language-independent logic form repre- 
sentation of the input, which is paraphrased back to NL 
for verification. If the input is a query, there is a transla- 
tion into a query language such as SQL. 

11 Claims, 5 Drawing Sheets 
NATURAL LANGUAGE ANALYZING APPARATUS AND METHOD

This is a continuation of application Ser. No. 07/485,917, filed Feb. 27, 1990, now abandoned.

BACKGROUND OF THE INVENTION

The present invention relates in general to the use of natural language for communication with computers, and in particular to querying data bases, e.g. relational data bases, or to translation between two natural languages of application specific texts.

There is a widely recognized demand in the computer world for user friendly interfaces for computers. Numerous attempts have been made in order to achieve this with various results.

The simplest way of creating programs that are possible to use without having particular skills is to design menu based systems where the user selects functions from a panel with several options.

Another way is to make use of screens with symbols ("icons") and letting the user select from the screen by pointing at the selected symbol with a light-pen, or by moving a cursor by means of a so called "mouse", pointing at the desired symbol, and then pressing a button for activating the function.

These methods have severe limitations in many applications where great flexibility in selection is desired, since such systems must be predefined, and unexpected or new desires require programming of the system again.

The need for flexibility is especially important for data retrieval from data bases. In order to make searches in data bases, often complex query languages must be used, requiring high skill. If reports are to be created from the retrieved data, further processing must be carried out. In addition, several successive queries may have to be entered before the end result is arrived at.

An example of a query language is SQL (Structured Query Language; IBM program no. 5748-XXJ). This is widely used but due to its complexity it is not possible for the average user to learn it satisfactorily, instead there are specialists available for creating SQL query strings that can be implemented as commands for searches of a routine nature. The specialist must be consulted every time a new kind of query is to be made.

There have been numerous attempts to remedy such deficiencies by trying to create interfaces to data bases which can interpret a query formulated in natural language. However, practically every such attempt has been based on key word identification in the input query strings. This inevitably leads to ambiguities in the interpretation in many cases.

Rather recently, research in the artificial intelligence area has led to systems where lexical, syntactical, and semantic analysis has been performed on input strings, utilizing grammars and dictionaries, mainly for pure translation purposes. It seems as if these systems are successful only to a certain extent, in that there is a relatively high rate of misinterpretations, resulting in incorrect translations. This frequently leads to the requirement of editing the result.

United Kingdom Patent Application 2,096,374 (Marconi Company) discloses a translating device for the automatic translation of one language into another. It comprises word and syntax analysis means, and the translation is performed in two steps by first translating the input sentence into an intermediate language, preferably artificial, and then translating the intermediate language into the target language.

European Patent Application EP-0168814 (NEC Corporation) discloses a language processing dictionary for bidirectionally retrieving morphemic and semantic expressions. It comprises a retrieving arrangement which is operable like a digital computer, and the dictionary itself is comprised of elementary dictionaries, namely a morphemic, a semantic and a conceptual dictionary. Each morphemic and conceptual item in the corresponding dictionaries are associated with pointers to a set of syntactical dictionary items. The syntactical items are associated with two pointers to a set of morphemic and a set of conceptual items.

U.S. Pat. No. 4,688,195 (Thompson et al, assigned to Texas Instruments) discloses a natural language interface generating system. It generates a natural language menu interface which provides a menu selection technique particularly suitable for the unskilled user.

However, none of the above listed patents fully address the problem solved by the present invention, although they do present alternative technical solutions to certain features.

SUMMARY OF THE INVENTION

It is an object of the present invention to provide a device and a method by means of which a user can formulate input expressions in a selected natural language in reasonably random fashion, which expressions are interpreted lexically, syntactically, and semantically by means of dictionaries and analysis grammars, and which are disambiguated and transformed into an intermediate representation form.

This intermediate representation form can then be used for creating queries in a specific query language (such as SQL for a relational data base) and/or for paraphrasing or "play-back" of the input for verification of the correctness of the machine's interpretation of the input expression or query. For this purpose there is provided a natural language generator including a generation grammar.

If the generation grammar is for another natural language than that of the analysis grammar, the latter function can also be used for pure translation into another language.

According to the invention, a natural language analyzing apparatus, has means for inputting sentences or expressions in natural language. Analysis grammar means analyzes the natural language in question. A vocabulary module contains definitions of terms of the natural language in question. Parsing means identifies the input sentence or expression as being grammatical and generates one or more parses, if there is one or more possible interpretations of the expression, in cooperation with the analysis grammar means and the vocabulary module.

The invention also has a conceptual schema stored in the system. The conceptual schema contains entities (e1, ... en) and relationships between the entities so as to form a description of a relevant universe of discourse. The entities and relationships of the universe of discourse are linked to corresponding natural language terms (t1) in the vocabulary.

Generator means comprise callable semantic routines for generating an intermediate representation of the natural language input. As a result, the semantic routines access the conceptual schema for checking that the input expression is valid in the relevant universe of discourse.

An important feature of the invention is the conceptual schema connected to natural language terms in a vocabulary. The schema is completely language independent, and contains only concepts (entities) and relations between concepts.

By using such a schema, which is a model of a relevant so called 'Universe of Discourse' (or object system, which is a collection of abstract or concrete things and information about these things, to which the natural language expression to be analyzed is relevant), it is possible to obtain complete resolution of ambiguities, as long as the input expression is in reasonable agreement with the Universe of Discourse. This has not been possible previously.

Since the schema is language independent there is a great advantage in that it is very easy to change analysis grammar and vocabulary, and thus to switch between different natural languages. In fact grammars and dictionaries can be supplied as 'plug-in' modules.

In a preferred embodiment of the invention the schema is also connected to the contents of a relational data base. That is, each concept of the schema may or may not have a unique connection to a table containing objects relating to that concept.

Thus, the schema constitutes a link between natural language and the data base. If thus the input expression is a query to the data base, the analysis will produce an interpretation of the query which then is translated into the query language for that data base (e.g. SQL).

In another embodiment, queries are paraphrased, i.e. if a query is ambiguous, two or more paraphrases are presented to the user, for him to select the correct one. Thereby one achieves that a 100% correct query is made to the data base.

In a further embodiment the paraphrasing function is used for pure translation. Thereby a generation grammar and a vocabulary for a second language are used when paraphrasing the input expression. In this case there is no use of a data base in the sense of the previously mentioned embodiment.

BRIEF DESCRIPTION OF THE DRAWING

FIG. 1 is a conceptual overview of a system comprising the natural language analyzing device according to the present invention, as implemented for querying a relational data base.

FIG. 2A is a schematic illustration of a simple example of a conceptual schema, modelling the data base contents, and which can be used with the invention.

FIG. 2B is a simplified illustration of how parts of the schema of FIG. 2A are linked to tables in a data base and to natural language terms in a vocabulary.

FIG. 3A is an example of a parse tree (or syntax tree) created during parsing.

FIG. 3B is a graphic illustration of a semantic tree built by the parser.

FIG. 4 is an illustration of the screen of the graphic interface of the Customizing Tool.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

With reference now to FIG. 1, the general layout and design of a system for querying a data base comprising the invention will be given.

A data base query system incorporating the invention thus has a query interface 1 comprising input means 2 that can have any suitable form for transforming character strings into digital signals, e.g. a keyboard of standard type. It is also conceivable that the input query is made by speech, in which case the input means would comprise a microphone and sound analyzing means.

There may also be present a display means 3 for presenting results of queries, results of parsing (paraphrased queries, to be described later), and also for displaying, e.g. help panels.

The core of the system is the natural language engine NLE 4. It comprises a natural language analyzer 5 which includes a parser and which is used for the actual syntax analysis. The analyzer 5 makes use of base dictionary 6 and application dictionary 7 and an analysis grammar 8 to perform the actual parsing of the input (to be described in more detail later).

The system further comprises a data base (DB) and a data base manager 9. It will not be described in detail since one skilled in the art readily recognizes the necessary design of such a device.

An essential feature of the invention is a model of the data base in the form of a conceptual schema (base conceptual model 10 and application conceptual model 11), which may be created by the user.

The concept of a conceptual schema is described in the literature in the field of artificial intelligence (see e.g. "Konceptuell Modellering" by J. Bubenko et al). Briefly, a conceptual model consists of

1) 'Entities', which are any concrete or abstract thing/things of interest;

2) 'Relationships', which are associations between entities;

3) 'Terms', which are natural language expressions that refer to entities;

4) 'Database Representations', which are mappings of entities into the database; and

5) 'Database Information' comprising 'Referential Integrity' and 'Key'.

As many entities as the user finds necessary may be defined, and the system will automatically suggest that every table in the data base is associated with an entity.

Entities of the model may be connected or linked to each other by one or several relationships. In general relationships fall into the following categories:

'is an instance of'      'subject'
'identifies'             'direct object'
'is named'               'dative object'
'is a subtype of'        'preposition'
'is counted by'          'adverbial of place'
'is measured by'         'adverbial of time'
                         (etc)

The 'subtype' relationship is a hierarchical relationship and is treated separately from the other non-hierarchical relationships. Most of the above relationships are self-evident as to their meaning, but for clarity a few examples will be given with reference to FIGS. 2A and 2B.



Entity                 Relationship          Entity

continent (CNTNNT)     'is identified by'    continent id (ID)
country (CNTRY)        'possesses'           capital (CPTL)
producer (PRDCR)       'is subtype of'       country (CNTRY)
export (EXPORT)        'has object'          product (PRDCT)

(in this last example the entity EXPORT has no link to a table in the data base).

Entities of the model are connected to natural language terms by the user, apart from a base collection of terms common to all applications (e.g. list, show, who, what, which, is, more etc.). Such terms are members of a base dictionary which is part of the system initially. It should be noted that an entity may be associated with zero, one or more natural language terms of the same category. The same term can also be associated with more than one entity.

Returning to FIG. 1, the actual building of the schema, comprising connecting it to the natural language terms and to tables of the data base, is performed with a customization tool (CT) 12 (described later). The "SRPI" boxes denote what one might call a communication protocol, necessary for communication with the host, for accessing the data base during customization (SRPI = server requester programming interface).

The way in which the conceptual schema is used to form a natural language interface to a data base or for translation purposes by connecting it to natural language terms has not been previously disclosed.

With reference now to FIGS. 2A and 2B, an example of how the conceptual schema is implemented within the scope of the invention will be given. In the example, a relational data base with tables containing information about a number of countries, is assumed as the information containing system.

As can be seen in FIG. 2B, the first table TABLE.CO contains three columns the contents of which relate to countries. One column lists countries, a second lists the capitals of the countries, and the third lists the continent to which the countries belong in terms of a continent identity number.

The second table TABLE.EXPORT lists in the first column the names of producer countries that export various products, and the second column lists which products each country in fact exports.

Finally the third table TABLE.CNT lists relevant continents in one column and a continent identity number in a second column.

The conceptual schema (FIG. 2A) is created during customization (to be described) and it represents a model which describes the collection of all objects in the information system, all facts about the system which are of interest to the users, and the relations between the objects and facts. In other words, it is a model of the universe of discourse (or object system) which is a selected portion of the real world, or a postulated world dealt with in the system in question.

The conceptual schema comprises entities (concepts), in the examples denoted as en, where n is an integer, and relationships (links) between these entities (concepts). The schema has two types of external connections, one to the natural language terms (as expressed by natural language vocabulary), and one to the data base (see EXAMPLES II and IV, below).

It is very important to recognize that the schema itself is language-independent, even though of course the concepts may have been assigned "names" expressed in a natural language, e.g. English.

The model as shown in FIG. 2A, is stored as a set of logical facts:

EXAMPLE I



possesses(e2, e1).
possesses(e2, e5).
possesses(e5, e2).
nom(e6, e3).             (e6 has-subject e3)
acc(e6, e8).             (e6 has-object e8)
subtype(e3, e2).
subtype(e4, e10).
subtype([illegible], e7).
identifies(e4, e5).      (e4 identifies e5)
identifies(e10, e7).
name(e11, e7).
lp(e2, e5).              ("location of place"; e2 is-in e5)
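As a rough illustration of how such a set of logical facts might be held and consulted, the following Python sketch stores a few of the legible facts from Example I as tuples and answers two simple questions about them. The tuple layout and the helper names `related` and `supertypes` are assumptions made for this sketch; the patent does not prescribe a storage format beyond "logical facts".

```python
# A minimal sketch, assuming facts are stored as (predicate, arg1, arg2) tuples.
# Only facts that are legible in Example I are included.
FACTS = {
    ("possesses", "e2", "e1"),
    ("nom", "e6", "e3"),       # e6 (export) has-subject e3 (producer)
    ("acc", "e6", "e8"),       # e6 (export) has-object e8 (product)
    ("subtype", "e3", "e2"),   # producer is a subtype of country
    ("subtype", "e4", "e10"),
    ("identifies", "e10", "e7"),
    ("name", "e11", "e7"),
}


def related(predicate, first):
    """Return, sorted, every second argument of `predicate` facts whose first argument is `first`."""
    return sorted(b for (p, a, b) in FACTS if p == predicate and a == first)


def supertypes(entity):
    """Follow 'subtype' facts upwards and collect all (transitive) supertypes."""
    found = []
    frontier = [entity]
    while frontier:
        current = frontier.pop()
        for parent in related("subtype", current):
            if parent not in found:
                found.append(parent)
                frontier.append(parent)
    return found


if __name__ == "__main__":
    print(related("nom", "e6"))   # declared subject entities of 'export'
    print(supertypes("e3"))       # supertypes of 'producer'
```

Because the hierarchical 'subtype' relationship is treated separately from the other relationships, it gets its own traversal helper here, while every other predicate is read through the generic `related` lookup.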



When customizing the system, the terms likely to be 
used by the users must be defined. The task of vocabu- 
lary definition includes connecting natural language 
terms to the entities in the schema and providing mor- 
phological information on them. 

For the data base in our example, the following terms 
may be defined (the en's are entities in the schema, and 
the tn's denote the terms, where n is an integer): 



EXAMPLE II

(e1) --> 'capital' (t1) noun, plural: 'capitals', pronoun: 'it'
(e2) --> 'country' (t2) noun, plural: 'countries', pronoun: 'it'
(e7) --> 'continent' (t3) noun, plural: 'continents', pronoun: 'it'
(e8) --> 'product' (t4) noun, plural: 'products', pronoun: 'it'
(e6) --> 'export' (t5) verb, forms: 'exports', 'exported', 'exported', 'exporting'
(e6) --> 'produce' (t6) verb, forms: 'produces', 'produced', 'produced', 'producing'



As can be seen the entity e6 has two different natural language terms connected to it, namely 'export' and 'produce'. This signifies that in the object system of the data base, 'export' and 'produce' are synonyms.

The opposite situation could occur as well, e.g. the 
word 'export' could have the meaning of "the exported 
products" or it could mean the verb "to sell abroad". In 
this case clearly the same word relates to two different 
concepts (homonyms). 

The customizer can define nouns, verbs and adjec- 
tives and connect them to the entities. Note that one 
entity may be connected to zero, one or several terms in 
natural language, and that the same term may be con- 
nected to more than one entity (concept). 
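The many-to-many association between entities and terms can be sketched in Python as a small lookup table with an inverted index. This is only an illustration: the dictionary layout and function names are assumptions, and the entity `e_hyp` is a hypothetical second concept ("the exported products") added solely to show the homonym case; it does not appear in the patent's schema.

```python
# A minimal sketch, assuming each entity maps to a list of natural language terms.
ENTITY_TERMS = {
    "e1": ["capital"],
    "e2": ["country"],
    "e7": ["continent"],
    "e8": ["product"],
    "e6": ["export", "produce"],  # two terms for one entity: synonyms
    "e_hyp": ["export"],          # hypothetical entity: same word, other concept (homonym)
}


def entities_for_word(word):
    """All entities a word may refer to (more than one for a homonym)."""
    return sorted(e for e, terms in ENTITY_TERMS.items() if word in terms)


def synonyms(word):
    """Other words attached to any entity that `word` is attached to."""
    result = set()
    for entity in entities_for_word(word):
        result.update(ENTITY_TERMS[entity])
    result.discard(word)
    return sorted(result)


if __name__ == "__main__":
    print(entities_for_word("export"))
    print(synonyms("export"))
```

A word that maps to several entities is exactly the situation in which the analysis has more than one candidate interpretation to resolve.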

The above definitions are stored as logical facts as a 
part of the conceptual schema (cf. EXAMPLE II): 



EXAMPLE III

image(e1, t1).
image(e2, t2).
image(e7, t3).
image(e8, t4).
image(e6, t5).
image(e6, t6).
category(t1, noun).
category(t2, noun).
category(t3, noun).
category(t4, noun).
category(t5, verb).
category(t6, verb).
term(t1, 'capital').
term(t2, 'country').
term(t3, 'continent').
term(t4, 'product').
term(t5, 'export').
term(t6, 'produce').
syntax(t1, 'capital'.'capitals'.'i'.nil).
syntax(t2, 'country'.'countries'.'i'.nil).
syntax(t3, 'continent'.'continents'.'i'.nil).
syntax(t4, 'product'.'products'.'i'.nil).
syntax(t5, 'export'.'exports'.'exported'.'exported'.'exporting'.nil).
syntax(t6, 'produce'.'produces'.'produced'.'produced'.'producing'.nil).

As can be seen, this collection of facts describes the link between the terms and the conceptual schema ("image( . . . )"), the grammatical class of terms ("category( . . . )"), the actual natural language word used for the term ("term( . . . )"), and the syntax ("syntax( . . . )") relevant to the term in the language in question (English in this case).

Thus, these expressions define how the terms (tn, where n is an integer) are related to the entities in the schema and what their grammatical classes are.

Dictionary entries are also created during the vocabulary definition. For example, the dictionary entry for the verb 'export' looks like this:

[dictionary entry illegible in the source]

In order to relate natural language queries to the relational data base, it is necessary to link or connect concepts of the model (i.e. the schema itself) to the data base.

Not all concepts are related to the data base, but there can only be one data base link for a specific concept. Of course several different links may be introduced if necessary, through definition of new concepts.

The links or connections between entities (or concepts) in the schema to the data base is made via SQL expressions:

EXAMPLE IV

(e2) --> SELECT CNTRY FROM TABLE.CO
(e1) --> SELECT CPTL FROM TABLE.CO
(e3) --> SELECT PRDCR FROM TABLE.EXPORT
(e8) --> SELECT PRDCT FROM TABLE.EXPORT
(e11) --> SELECT CNTNNT FROM TABLE.CNT
(e4) --> SELECT CNT_ID FROM TABLE.CO
(e10) --> SELECT ID FROM TABLE.CNT

The links to the data base can be very complicated SQL expressions. The information on such links is stored as the following logical facts and they too constitute a part of the conceptual schema together with the previously mentioned logical facts (see EXAMPLES II and III):

EXAMPLE V

db(e2, set(V1, relation(table.co(V1=cntry)))).
db(e1, set(V1, relation(table.co(V1=cptl)))).
db(e3, set(V1, relation(table.export(V1=prdcr)))).
db(e8, set(V1, relation(table.export(V1=prdct)))).
db(e11, set(V1, relation(table.cnt(V1=cntnnt)))).
db(e4, set(V1, relation(table.co(V1=cnt_id)))).
db(e10, set(V1, relation(table.cnt(V1=id)))).

Here 'db' indicates the data base link, and "relation" shows the connection between an entity and the corresponding column of a table.

Thus, the conceptual schema consists of a collection of logical facts of the types according to EXAMPLES II, III, and V. Other types could also be conceived.

In the following, the translation of a natural language query into SQL will be described.

Parsing is the first step in processing a natural language query. The parser in the natural language analyzer 5 (FIG. 1) scans the input string character by character and finds, by using dictionary entries and grammar rules (syntactic rules) in the analysis grammar 8, all possible combinations of patterns which are grammatical. Parsing techniques are well known in the art and will not be discussed in detail. (See, for example, European Patent 91317 (Amano and Hirakawa)).

The parser produces, as one of its outputs, a single parse tree (or syntax tree), or several parse trees (FIG. 3A) if the query is ambiguous, describing how dictionary look-ups and application of syntactic rules resulted in recognition of an input string as being grammatical.

For example the query 'who exports all products' will generate the parse tree shown in FIG. 3A. (Other examples of queries and intermediate [...] structures in the parse are given in the Appendix.)

In FIG. 3A, the top of the tree reads (sent) indicating that the input string was identified as a sentence. All connections between branches and the ends of the branches are referred to as nodes, each having an identifier.

The meaning of these identifiers are mostly evident (e.g. 'noun'). However, (np) denotes a 'nominal phrase', (vc) a 'verb construct' (equivalent to a 'verbal phrase'), and (sc) is a 'sentence construct' meaning a grammatically valid clause (not necessarily a complete sentence).

Further, every syntactic rule (grammar rule) is associated with zero, one or more semantic routines (executable programs), and the parser produces as a second output a semantic tree (FIG. 3B) in association with each syntax tree.

Two examples of grammar rules are given below:

<SENT:1:FPE-COMMAND(1,2)> <-<SC:TYP=AZ,+IMP,+CMD,(SYST= »)-
!(SYST=2)< >NP:+ ACC>

<SCT:],+ES,-|-CN:FPE.NOM(2,l)><.<VC:TYP t =N2,-|-CNA.COL=COL(2)_
-DS,-PPE,-IMP,-PAS,((-SG)&(^G<2)))!((.PL(2)))><NP: + NOM-
-REL,-WPRO>;

These rules are built in one of the many formalisms that exist (in this case ULG), and thus constitute mere examples of how they can be built.
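The association between syntactic rules and semantic routines can be pictured with a small Python sketch. It is not the ULG formalism: the rule table, the nested-tuple parse tree, and the function `semantic_tree` are all assumptions made for illustration, and the toy parse of 'who exports all products' is hypothetical rather than a copy of FIG. 3A. Only the routine names COMMAND and NOM are taken from the rule examples.

```python
# A minimal sketch, assuming each node label maps to zero or more routine names.
RULE_ROUTINES = {
    "sent": ["COMMAND"],  # routine named in the first rule example
    "sc": ["NOM"],        # routine named in the second rule example
    "np": [],             # rules without a semantic routine add no node
    "vc": [],
}


def semantic_tree(parse):
    """Build a semantic tree as a list of (routine_name, children) nodes.

    `parse` is a (label, children) pair; leaf children are plain strings.
    """
    label, children = parse
    nested = []
    for child in children:
        if isinstance(child, tuple):
            nested.extend(semantic_tree(child))
    node = nested
    # Wrap the collected sub-nodes in one node per routine attached to this rule.
    for name in reversed(RULE_ROUTINES.get(label, [])):
        node = [(name, node)]
    return node


if __name__ == "__main__":
    toy_parse = ("sent", [
        ("sc", [("vc", ["exports"]), ("np", ["who"])]),
        ("np", ["all", "products"]),
    ])
    print(semantic_tree(toy_parse))
```

The sketch keeps the syntax tree and the semantic tree as separate outputs, mirroring the statement that the parser produces a semantic tree in association with each syntax tree.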

An argument of the syntactic rules may contain a call for or pointer to a semantic routine mentioned above, if appropriate, and for each rule that is activated and contains such "pointer" or "call", a semantic routine is allocated, and a "semantic tree" is built. (In the first of the given examples, the argument FPE-COMMAND(1,2) is a call for a routine named COMMAND thus building a node named COMMAND; in the second example, the argument is a call for NOM.)

The semantic trees are nested structures containing the semantic routines, and the trees form executable programs, which produce an intermediate representation form of the query when they are executed.

This intermediate representation form of the original query preserves the meaning of the query, as far as the [...] when expressed as an executable program:

EXAMPLE VI

[listing illegible in the source]

Here the p's are pointers to the internal structures created during parsing for the input query, and each line begins with the name of the routine called for in the applied syntactic rule.

After completion of the semantic tree the main program enters next loop in which the tree is "decomposed" into its nodes (each individual semantic routine is a node), and the routines are executed from the bottom and up, which will trigger execution of the nested routines in the structure.

The semantic routines "use" the conceptual schema, and the information on the entities in the schema, for checking that the information contents of the generated semantic tree corresponds to a valid relationship structure within the universe of discourse defined by the schema. Thus, the execution of these routines performs a check of a language expression against the conceptual schema to see if the expression is a valid one (within the defined universe of discourse or object system).

By using the conceptual schema, the semantic routines generate a representation of the natural language queries in a form called CLF (conceptual logical form). This is a first order predicate logic with set and aggregate functions. (One of ordinary skill in the art can design such representations in many different ways and still achieve the same object.)

The CLF representation of the example query will be:

EXAMPLE VII

query(
  report,
  [...]
  instance(e8, y2) ->
  exist(y3,
  instance(e6, y3) &
  [...]

simply meaning that the user wants a report (as opposed to a yes/no answer or a chart) of everything which exports all products and by 'all products' the user can here only mean products appearing as data in the database.

The CLF is then verified, completed, and disambiguated by checking against the conceptual schema. If, for example, the verb 'export' is defined in the conceptual schema such that it may take subjects from two different entities, then two CLFs must be produced, one for each case. On the other hand if there is no subject for the verb 'export' in the model, the CLF must be aborted.

In the above example, the checking against the model [...]

EXAMPLE VIII

query(
  report,
  [...]
  nom(y3,y1))))).

where the added information is that the user wants a list of countries, 'country' (e2) being a supertype of the concept e3, 'producer'.

Contextual references are also resolved at this stage where any reference to previous queries, either in the form of a pronoun or fragment, is replaced by the appropriate CLF statements from those previous queries.

In order to verify the interpretation of the queries with the user and let the user select the correct interpretation among several alternatives generated by the invention, the CLF (conceptual logic form) must be presented in natural language form as paraphrasings of the original query.

To generate natural language from CLF, the CLF first is translated into a set of structures (trees) called Initial Trees. These trees contain such information as what the focus or core of the query is, what concepts are involved in the query, and what are the relationships between them. The following set of Initial Trees will be generated for our example CLF:

noun((id=3).(group=1).(scope=nil).(var=y1).(entity=e3).(focus=1).nil).
noun((id=1).(group=1).(scope=nil).(var=y2).(entity=e8).(all=1).nil).
verb((id=2).(group=1).(scope=y2.nil).(var=y3).(entity=e6).(acc=y2).(nom=y1).nil).

The paraphrased version of our previous example query will be 'List the countries that export all products'. This paraphrased expression is presented to the user for verification.

When the user has confirmed/selected the interpretation, the corresponding CLF is translated into an SQL-expression. This process involves two steps, namely a translation of the CLF to a further intermediate representation form (data base oriented logical form; herein referred to as DBLF).

This form is similar to CLF (or any other equivalent representation that is used), except that the entities are replaced by their data base links from the conceptual schema (see Example IV). Thereby, the appropriate connections between the SQL tables are established.
In our example, the following DBLF is generated from the corresponding CLF (see Example VIII):

EXAMPLE IX

query(
  report,
  set(y1,
    relation(table.co(cntry = y1)) &
    all(y2,
      relation(table.export(prdct = y2)) ->
      relation(table.export(prdct = y2, cntry = y1)))))
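The replacement of entities by their data base links can be sketched in Python as a recursive rewrite over a nested-tuple form. The tuple encoding of the logical form and the function `to_dblf` are assumptions made for this sketch; the table and column names come from Examples IV and V. The sketch covers only the substitution of `instance` terms for entities that have a data base link and leaves everything else untouched, so it does not reproduce how the patent merges the verb's subject and object into a single relation.

```python
# A minimal sketch, assuming logical forms are nested tuples such as
# ("instance", "e2", "y1") and data base links are (table, column) pairs.
DB_LINKS = {
    "e2": ("table.co", "cntry"),
    "e1": ("table.co", "cptl"),
    "e3": ("table.export", "prdcr"),
    "e8": ("table.export", "prdct"),
    "e11": ("table.cnt", "cntnnt"),
    "e4": ("table.co", "cnt_id"),
    "e10": ("table.cnt", "id"),
}


def to_dblf(term):
    """Replace instance(entity, var) by relation(table, {column: var}) where a link exists."""
    if not isinstance(term, tuple):
        return term  # variables, entity names and operators are left as they are
    if term[0] == "instance" and term[1] in DB_LINKS:
        table, column = DB_LINKS[term[1]]
        return ("relation", table, {column: term[2]})
    return tuple(to_dblf(part) for part in term)


if __name__ == "__main__":
    clf_fragment = ("and", ("instance", "e2", "y1"), ("instance", "e6", "y3"))
    print(to_dblf(clf_fragment))
```

The entity e6 ('export') has no entry in `DB_LINKS`, matching the earlier remark that EXPORT has no link to a table, so its `instance` term passes through unchanged in this sketch.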






The DBLF contains all information necessary to 
construct the SQL query. 

There is also an optimization of the queries by remov- 
ing redundant connections based on the information on 
the data base elicited during the customization. 

If the natural language query cannot be translated 
into one single SQL query, the DBLF will be translated 
into something beyond pure SQL, and this extension of 
SQL is called an Answer Set. An Answer Set has the 
following components: 

1) Temporary tables. A query like "How many 
countries are there in each continent" cannot be 
represented directly in SQL. To obtain the answer, 
a temporary table must be created, filled with data 
and then selected. 

The information to do this is part of the Answer Set. 

2) Range. There is no range concept in SQL. A query 
like "List the three highest mountains in the 
world" cannot be represented. The range specifica- 
tion in the Answer Set takes care of this and it is up 
to the program displaying the answer to the user to 
apply it. 

3) Report. The third part of the Answer Set is related 
to how the answer should be presented to the user. 
There may be three options: Report (default), 
Chart, or YES/NO. 

This makes it possible to handle queries like "Show
me, in a bar chart, the sales figures for last month".
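
The three components of an Answer Set can be sketched as a small data structure. The following Python example is an illustration under stated assumptions, not the implementation of the apparatus: the AnswerSet class, its field names and the run function are hypothetical, SQLite is used as a stand-in data base, the tables co and export stand in for TABLE.CO and TABLE.EXPORT, and the sample rows are invented. The query is 'List the countries that export all products'.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AnswerSet:
    """Hypothetical container for the three Answer Set components."""
    setup: List[str] = field(default_factory=list)  # temporary-table statements
    query: str = ""                                 # final SELECT statement
    limit: Optional[int] = None                     # range; None means all rows
    presentation: str = "REPORT"                    # REPORT, CHART or YES/NO


def run(conn, answer_set):
    """Create and fill the temporary tables, then run the final query."""
    for statement in answer_set.setup:
        conn.execute(statement)
    rows = conn.execute(answer_set.query).fetchall()
    # The range is applied by the displaying program, not by SQL.
    if answer_set.limit is not None:
        rows = rows[:answer_set.limit]
    return rows


conn = sqlite3.connect(":memory:")
# Invented sample data: France exports every product, Spain only one.
conn.executescript("""
    CREATE TABLE co (cntry TEXT);
    CREATE TABLE export (cntry TEXT, prdct TEXT);
    INSERT INTO co VALUES ('France'), ('Spain');
    INSERT INTO export VALUES
        ('France', 'wine'), ('France', 'cars'), ('Spain', 'wine');
""")

answer_set = AnswerSet(
    setup=[
        "CREATE TEMP TABLE t1 (cntry, card)",
        "INSERT INTO t1 (cntry, card) "
        "SELECT x1.cntry, COUNT(DISTINCT x1.prdct) "
        "FROM export x1 GROUP BY x1.cntry",
    ],
    query=(
        "SELECT DISTINCT x1.cntry FROM co x1, t1 x3 "
        "WHERE x1.cntry = x3.cntry "
        "AND x3.card = (SELECT COUNT(DISTINCT x2.prdct) FROM export x2)"
    ),
)
result = run(conn, answer_set)
```

Keeping the range outside the SQL text corresponds to the statement that it is up to the program displaying the answer to apply it.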

For the above example query, the following struc- 
tures will be created: 

EXAMPLE X

CREATE TABLE t1 (cntry, card)
INSERT INTO t1 (cntry, card)
  SELECT x1.cntry, COUNT( DISTINCT x1.prdct )
  FROM table.export x1 GROUP BY x1.cntry
SELECT DISTINCT x1.cntry
  FROM table.co x1, t1 x3
  WHERE x1.cntry = x3.cntry
  AND x3.card = (
    SELECT COUNT( DISTINCT x2.prdct )
    FROM table.export x2)
NIL

REPORT

which results in a temporary relation created as the SQL
table T1 with the columns CNTRY and CARD.

The column CNTRY is copied from the column CNTRY in the
table TABLE.EXPORT, and the values in the column CARD are
calculated as the number of distinct products (PRDCT column
in TABLE.EXPORT) related to each country.

The final query is made against the T1 table and results in
a list of the countries which export as many products as the
number of distinct products found in the data base (only
France in this case).

Each query the user makes is automatically stored in
a log. If the query is successful it is put in a Current Log,
and if it fails it is put in an Error Log.

A query in the Current Log may be copied into the
input field of the main program. There the user can edit
it before it is processed. The Answer Set stored with the
query can directly be used to obtain the answer.

The log can be stored and later reused by loading it
into a Current Log. It can be viewed in a separate window.
Queries appearing in such windows may be copied
into the input line and the Answer Set sent to obtain the
answer.

There is also provided a facility for creating the
conceptual model and the vocabulary definition. This
facility is referred to as a Customization Tool.

It is designed to be easy to use by providing a graphic
interface (see FIG. 4), including an editing function, to
the person performing the customization (the customizer).

With this interface the following functions are available:

entities and relationships are presented as symbols
(icons)

the entities and relationships can be manipulated

the current state of the model under construction is
shown by highlighting the objects on the screen in
different ways

sets of objects can be clustered, for hiding complex
structures in order to make the model more transparent

The various entity icons 13 used in the graphic interface
(see FIG. 4) can be, e.g. circles, ellipses, hexagons
or triangles, whereby the shape is determined by the
lexical category of terms referring to the entity in
question. Each entity icon is annotated by the entity name.

Relations or sets of relations between entities are
represented by line segments (connector icons).

A cluster icon represents a subset of the schema, and
has the shape of a rectangle 14.

A small diamond shaped icon (marker icon) is used to
represent the current position in the schema.

The graphic interface uses the select-then-act protocol
to manipulate entities and relationships. Below is
given a brief description of the graphic interface.

Preferably a mouse is used for ease of use, and a number
of options are selectable from various panels and
action bars 16. For example 'Create Entity' displays an
entity icon in a selected vacant spot on the screen. It
also 'opens' the entity for inputting definitions of the
entity.

The 'Create Connector' option is operable to create
the relationship between two entities. With this option a
line segment 15 connecting two previously defined
entities is created.

If there are many entities connected to one single
'main' entity, a Cluster can be formed whereby only the
selected 'main' entity is displayed, but with a different
shape (e.g. a rectangle) to distinguish it from ordinary
entity representations.

In a preferred embodiment implemented for a relational
data base, the method comprises an initial step of
identifying the tables in the data base and defining the
relations between the tables. The system then automatically
responds by suggesting a conceptual model comprising
entities and relationships between these entities.
This model is presented to the user (the customizer) for
verification.

Thereafter, the customizer continues to interactively
create entities and relationships in view of his/her
knowledge of the system in question (e.g. a relational
data base).

The method also comprises linking the entities to
natural language terms, and storing the terms in a
dictionary.

The entities are classified as belonging to any of a
predefined set of types (person, place, event, process,
time, identifier, name etc.), the types being stored.

In addition it comprises creating the links to the data 
base by identifying which data base representation (e.g. 
in a subset of SQL; see EXAMPLE IV) the entities 
shall have. 

The whole model including entities, relationships,
vocabulary and data base links is stored as (logical)
facts.

A still further aspect of the invention is that by keeping
knowledge of the system in question and other information
used in the natural language analyzing apparatus
in data base tables (such as SQL tables), users can
use the method and apparatus of the invention to query
that knowledge and thus request meta-knowledge.

In this way there is no difference between ordinary
queries and meta-knowledge queries, neither from the
user's point of view nor from the system's.

The conceptual schema for meta-knowledge is created
in advance as a part of a base conceptual schema.
Such a schema is application independent, and the tables
used for storing said schema are given unique
dummy names when customized. During CLF to
DBLF translation (as previously described), when these
dummy table names appear in the data base representations,
they are replaced with the correct table name
corresponding to the current application.

For example, the table where a list of all tables included
in the application is kept can be called 'appl tabs'
when the schema for meta-knowledge is created. Then,
when a specific application 'xyz' is run, the CLF to
DBLF translator replaces 'appl tabs' with 'xyz tabs' in
the data base representations.
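
The substitution of dummy table names can be sketched in a few lines of Python. This is an illustration only: the function name bind_application, the string encoding of the data base representation, the column name 'name' and the underscore spelling 'appl_tabs' (the text above shows the names with a space) are all assumptions.

```python
import re


def bind_application(db_representation: str, application: str) -> str:
    """Replace dummy table names 'appl_<suffix>' by '<application>_<suffix>'.

    The 'appl_' prefix convention is an assumption made for this sketch.
    """
    return re.sub(
        r"\bappl_(\w+)",
        lambda match: f"{application}_{match.group(1)}",
        db_representation,
    )


# Hypothetical data base representation referring to the dummy table.
bound = bind_application("relation(appl_tabs(name = y1))", "xyz")
```

Because the substitution happens at translation time, the same base conceptual schema can serve every application without being customized again.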

As mentioned previously the conceptual model
(schema) is stored as (logical) facts. There are identifiers
associated with these facts corresponding to the name
of a relational data base table (cf. EXAMPLE III where
the identifiers are the 'prefixes': 'image', 'category',
'term', etc).

In the process of creating meta-knowledge, when the
person doing the customization ends a session, either
having completed a model or terminating the modelling
temporarily, these facts are automatically read from
storage, the identifiers are recognized by the system,
and the facts are stored in the empty, predefined tables
(linked to the pre-created base conceptual schema).
Note that the identifiers are not necessarily identical to
the names of the tables; there may be conditions specifying
that, e.g. the facts belonging to the identifier 'term'
be put in a table labeled 'words'.

The tables that subsequently are 'filled' with facts are
then accessible for querying in the same way as ordinary
data base tables, thus providing the desired meta-knowledge.
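
The loading of facts into predefined tables can be sketched as follows. This Python example rests on assumptions: the layout of a fact as (identifier, entity, value), the entity labels 'e1' and 'e2', the two-column table layout and the use of SQLite are hypothetical; only the identifiers 'term' and 'category' and the table name 'words' are taken from the text.

```python
import sqlite3

# Identifier -> table name. Identifiers need not equal table names:
# facts with the identifier 'term' go to the table 'words'.
TABLE_FOR = {"term": "words", "category": "category"}

# Invented facts in an assumed (identifier, entity, value) layout.
facts = [
    ("term", "e1", "country"),
    ("term", "e2", "product"),
    ("category", "e1", "noun"),
]

conn = sqlite3.connect(":memory:")

# The empty, predefined tables of the base conceptual schema.
for table in sorted(set(TABLE_FOR.values())):
    conn.execute(f"CREATE TABLE {table} (entity TEXT, value TEXT)")

# At the end of a customization session each fact is routed
# to its table according to its identifier.
for identifier, entity, value in facts:
    conn.execute(
        f"INSERT INTO {TABLE_FOR[identifier]} VALUES (?, ?)",
        (entity, value),
    )

# The filled tables can now be queried like ordinary data base tables.
terms = conn.execute("SELECT value FROM words ORDER BY value").fetchall()
```

Once the facts sit in ordinary tables, a meta-knowledge query needs no special handling and goes through the same translation as any other query.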

APPENDIX 

In this appendix a few more examples of queries, the
intermediate representations of the queries, and the
final SQL are listed (note that the entire Answer Set is
not given).

EXAMPLE 1

'List the capitals of the countries'

Semantic tree:

command(
  p85,
  gener(
    p37,
    'list'),
  npdefi(
    P«.
    'the',
    attgen(
      P*4,
      nomen(
        p62,
        'capital'),
      prep(
        p61,
        npdefi(
          p58,
          'the',
          nomen(
            p53,
            'country')),
        'PP',
        gener(
          p47,
          'of')))))

CLF:

query(report,0,
  set(y1,
    instance(capital,y1) &
    exist(y2,
      instance(country,y2) &
      possesses(y2,y1))))

DBLF:

query(report,0, set(y1,
  relation(table.co(capital = y1, country = y2))))

SQL:

SELECT DISTINCT x1.capital, x1.country FROM table.co x1



EXAMPLE 2

'What does England export'

CLF:

query(report,0, set(y1,
  instance(product,y1) & exist(y2,
    instance(provider,y2) &
    name(y2,'great britain') &
    exist(y3,
      instance(export,y3) &
      nom(y3,y2) &
      acc(y3,y1)))))

DBLF:

query(report,0, set(y1,
  relation(table.exportbase(country = 'great britain', product = y1))))

SQL:

SELECT DISTINCT x1.product
FROM table.exportbase x1
WHERE x1.country = 'great britain'



EXAMPLE 3

'What are the populations of the ec-countries'

CLF:

query(report,0, set(y2,
  instance(population,y2) &
  exist(y3,
    instance(ec-country,y3) &
    possesses(y3,y2))))

DBLF:

query(report,0, set(y2,
  set(y3,
    relation(table.co(population = y2)) &
    relation(table.orgbase(country = y3, organization = 'EC')))))

SQL:

SELECT DISTINCT x1.population, x1.country
FROM table.co x1, table.orgbase x2
WHERE x2.organization = 'EC'
AND x2.country = x1.country

We claim: 

1. A natural language analyzing apparatus compris- 
ing: 

a data base store comprising a data base containing 
tables; 

a grammar store comprising a grammar for a natural
language comprising a set of language dependent
syntax rules for the natural language, at least one
syntax rule having one or more associated semantic
routines;

a vocabulary store comprising a vocabulary contain- 
ing terms of the natural language, definitions of the 
terms, and morphological information about the 
terms; 

a conceptual model store comprising a conceptual
model having (i) a set of language independent
records of information defining entities, each entity
having a connection to at least one term in the
vocabulary, at least one entity having a connection
to the data base tables, and each term in the
vocabulary being defined by at least one entity, and (ii) a
set of records identifying relationships between
different entities;

means for inputting a series of words based in the 
natural language; 

parsing means for generating one or more syntactically
valid parse trees for the input series of words
based on the vocabulary and the syntax rules, and
for building, for each parse tree, an executable set
of semantic routines based on one or more semantic
routines associated with one or more of the syntax
rules used to generate the parse tree;

generator means for executing the set of semantic 
routines generated by the parser to create a lan- 
guage independent representation of the input se- 
ries of words, wherein executing the semantic rou- 
tine comprises checking groups of one or more 
words in the parse trees against the conceptual 
model for conceptual validity; 

output means for producing, from said language
independent representation of the input series of words,
a natural language output series of words in the
same language as the input series of words, said
output series of words representing a paraphrase of
the input series of words;

confirmation means for requesting confirmation of
the conceptual accuracy of the output series of
words with respect to the input series of words and
for receiving a confirmation from the user if the
user determines that the output series of words
conceptually matches the input series of words;
and

query generator means responsive to the confirma- 
tion for producing a data base query from the lan- 
guage independent representation. 

2. An apparatus as claimed in claim 1, wherein the 
data base is a relational data base. 

3. The apparatus as claimed in claim 2, further com- 
prising means responsive to the query generating means 
for producing a response to the data base query. 

4. An apparatus as claimed in claim 3, comprising
means for storing a previous query, and means for storing
for the previous query a corresponding answer set,
said answer set comprising a query statement, a
specification of how much of the data in the data base tables is
to be presented to a user of the apparatus, and
information on a mode of presentation of the data.

5. A method of querying an information store com- 
prising the steps of: 
generating a computer-readable input representing an
input series of words in a natural language;
parsing an input expression to generate one or more 
syntactically valid parse trees for the input series of 
words based on a vocabulary containing terms of 
the natural language, definitions of the terms, and 
morphological information about the terms, and 
based on syntax rules of a grammar for the natural 
language, at least one syntax rule having one or 
more associated semantic routines; 
building, for each parse tree, an executable set of
semantic routines based on the one or more semantic
routines associated with the at least one syntax
rule having one or more associated semantic routines
and used to generate the parse tree;
executing the executable set of semantic routines to 
create a language independent representation of 
the input series of words based on a conceptual 
model having (i) a set of language independent 
records of information defining entities, each entity 
having a connection to at least one term in the 
vocabulary, at least one entity having a connection 
to a data base table, and each term in the vocabu- 
lary being defined by at least one entity, and (ii) a 
set of records identifying relationships between 
different entities, wherein executing the semantic 
routine comprises checking groups of one or more 
words in the parse trees against the conceptual 
model for conceptual validity; 
producing, from said language independent represen- 
tation of the input series of words, a natural lan- 
guage output series of words in the same language 
as the input series of words, said output series of 
words representing a paraphrase of the input series 
of words; 

requesting confirmation of the conceptual accuracy 
of the output series of words with respect to the 
input series of words, and producing a confirma- 
tion if the output series of words is conceptually 
correct; and 

if the confirmation is produced, producing, in re- 
sponse to the confirmation, a data base query from 
the language independent representation. 

6. The method as claimed in claim 5, further compris- 
ing querying a data base with the data base query. 

7. A method as claimed in claim 6, wherein if an
answer cannot directly be retrieved from the data base
tables in one single query statement, temporary tables
are created and filled with data, said temporary tables
being queried for a final answer.

8. A method as claimed in claim 7, wherein the data
is ordered by ascending or descending value of the
data.

9. A method as claimed in claim 8, wherein only a 
selected portion of the data is presented to a user of the 
method. 

10. A method as claimed in claim 9, wherein an an- 
swer set comprising an instruction to create and fill the 
temporary tables, together with the query and a range 
of data to be selected from the temporary tables, is 
stored in a log for later use. 

11. A method as claimed in claim 10, wherein the 
stored answer set is copied into an input field of a query 
panel of a query program. 

* * * * *